Normalization of spectro-temporal Gabor filter bank features for improved robust automatic speech recognition systems
Abstract
Physiologically motivated feature extraction methods based on 2D-Gabor filters have already been used successfully in robust automatic speech recognition (ASR) systems. Recently it was shown that a Mel Frequency Cepstral Coefficients (MFCC) baseline can be improved with physiologically motivated features extracted by a 2D-Gabor filter bank (GBFB). Besides physiologically inspired approaches to improving ASR systems, technical ones such as mean and variance normalization (MVN) or histogram equalization (HEQ) exist, which aim to remove undesired information from the speech representation by normalization. In this study, we combine the physiologically inspired GBFB features with MVN and HEQ and compare them with MFCC features. Additionally, MVN is applied at different stages of MFCC feature extraction in order to evaluate its effect on spectral, temporal, or spectro-temporal patterns. We find that MVN and HEQ dramatically improve the robustness of MFCC and GBFB features on the Aurora 2 ASR task. While normalized MFCCs perform best with clean condition training, normalized GBFBs improve on the ETSI MFCC features with multi-condition training by 48%, outperforming the ETSI advanced front-end (AFE). MVN, which may be interpreted as a normalization of modulation depth, works best when applied to spectro-temporal patterns. HEQ was not found to perform better than MVN.
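To make the two normalization schemes named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that features arrive as a NumPy array of shape (frames, dimensions), that statistics are computed per utterance, and that HEQ maps each dimension onto a standard normal reference distribution; the function names `mvn` and `heq` and the `eps` parameter are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def mvn(features, eps=1e-8):
    """Mean and variance normalization (assumed per utterance).

    features: array of shape (num_frames, num_dims).
    Each feature dimension is shifted to zero mean and scaled to unit variance.
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)  # eps guards against constant dimensions


def heq(features):
    """Histogram equalization onto a standard normal reference (an assumption).

    Each value is replaced by the Gaussian quantile of its empirical CDF value,
    estimated from its rank within the same feature dimension.
    """
    num_frames = features.shape[0]
    # Double argsort yields the rank (0 .. num_frames - 1) of every frame per dimension.
    ranks = features.argsort(axis=0).argsort(axis=0)
    # The 0.5 offset keeps the CDF estimate strictly inside (0, 1).
    cdf = (ranks + 0.5) / num_frames
    to_gaussian = np.vectorize(NormalDist().inv_cdf)
    return to_gaussian(cdf)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    utterance = rng.normal(loc=5.0, scale=3.0, size=(200, 4))  # synthetic stand-in features
    print(mvn(utterance).mean(axis=0))
    print(heq(utterance).std(axis=0))
```

MVN only matches the first two moments of each dimension, whereas this HEQ variant reshapes the whole distribution; both operate independently on each feature dimension along the time axis.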
Similar resources
Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition.
To test whether simultaneous spectral and temporal processing is required to extract robust features for automatic speech recognition (ASR), the robust spectro-temporal two-dimensional Gabor filter bank (GBFB) front-end from Schädler, Meyer, and Kollmeier [J. Acoust. Soc. Am. 131, 4134-4151 (2012)] was decomposed into a spectral one-dimensional Gabor filter bank and a temporal one-dimensional-Gabor...
Hooking up spectro-temporal filters with auditory-inspired representations for robust automatic speech recognition
Spectro-temporal filtering has previously been shown to yield features that can help increase the robustness of automatic speech recognition (ASR). We replace the spectro-temporal representation used in previous work with spectrograms that incorporate knowledge about the signal processing of the human auditory system and which are derived from Power-Normalized Cepstral Coefficients (PNC...
Spectro-temporal modulation subspace-spanning filter bank features for robust automatic speech recognition.
In an attempt to increase the robustness of automatic speech recognition (ASR) systems, a feature extraction scheme is proposed that takes spectro-temporal modulation frequencies (MF) into account. This physiologically inspired approach uses a two-dimensional filter bank based on Gabor filters, which limits the redundant information between feature components, and also results in physically int...
Informative spectro-temporal bottleneck features for noise-robust speech recognition
Spectro-temporal Gabor features based on auditory knowledge have improved word accuracy for automatic speech recognition in the presence of noise. In previous work, we generated robust spectro-temporal features that incorporated the power normalized cepstral coefficient (PNCC) algorithm. The corresponding power normalized spectrum (PNS) is then processed by many Gabor filters, yielding a high d...
Spectro-temporal Gabor features as a front end for automatic speech recognition
A novel type of feature extraction is introduced to be used as a front end for automatic speech recognition (ASR). Two-dimensional Gabor filter functions are applied to a spectro-temporal representation formed by columns of primary feature vectors. The filter shape is motivated by recent findings in neurophysiology and psychoacoustics which revealed sensitivity towards complex spectro-temporal ...